Este es un análisis exploratorio de datos con fines didácticos. El análisis es realizado con datos relacionados con Covid-19 reportados por diversos países. Los datos son obtenidos del sitio: https://ourworldindata.org/coronavirus-source-data. El conjunto de datos es actualizado constantemente. Por lo tanto, una ventaja de descargarlo directamente del sitio es obtener los datos actualizados.
Hecho por: Oscar Castro
Fecha de ejecución del codigo:
print(format(Sys.Date(), "%A, %d de %B de %Y"))
## [1] "sábado, 08 de agosto de 2020"
covid.df <- read.csv("https://covid.ourworldindata.org/data/owid-covid-data.csv")
Cargamos los paquetes necesarios:
library(ggplot2)
library(knitr)
library(reshape2)
library(kableExtra)
str(covid.df)
## 'data.frame': 35507 obs. of 36 variables:
## $ iso_code : Factor w/ 212 levels "","ABW","AFG",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ continent : Factor w/ 7 levels "","Africa","Asia",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ location : Factor w/ 212 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ date : Factor w/ 222 levels "2019-12-31","2020-01-01",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ total_cases : num 0 0 0 0 0 0 0 0 0 0 ...
## $ new_cases : num 0 0 0 0 0 0 0 0 0 0 ...
## $ total_deaths : num 0 0 0 0 0 0 0 0 0 0 ...
## $ new_deaths : num 0 0 0 0 0 0 0 0 0 0 ...
## $ total_cases_per_million : num 0 0 0 0 0 0 0 0 0 0 ...
## $ new_cases_per_million : num 0 0 0 0 0 0 0 0 0 0 ...
## $ total_deaths_per_million : num 0 0 0 0 0 0 0 0 0 0 ...
## $ new_deaths_per_million : num 0 0 0 0 0 0 0 0 0 0 ...
## $ new_tests : num NA NA NA NA NA NA NA NA NA NA ...
## $ total_tests : num NA NA NA NA NA NA NA NA NA NA ...
## $ total_tests_per_thousand : num NA NA NA NA NA NA NA NA NA NA ...
## $ new_tests_per_thousand : num NA NA NA NA NA NA NA NA NA NA ...
## $ new_tests_smoothed : num NA NA NA NA NA NA NA NA NA NA ...
## $ new_tests_smoothed_per_thousand: num NA NA NA NA NA NA NA NA NA NA ...
## $ tests_per_case : num NA NA NA NA NA NA NA NA NA NA ...
## $ positive_rate : num NA NA NA NA NA NA NA NA NA NA ...
## $ tests_units : Factor w/ 6 levels "","people tested",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ stringency_index : num NA 0 0 0 0 0 0 0 0 0 ...
## $ population : num 38928341 38928341 38928341 38928341 38928341 ...
## $ population_density : num 54.4 54.4 54.4 54.4 54.4 ...
## $ median_age : num 18.6 18.6 18.6 18.6 18.6 18.6 18.6 18.6 18.6 18.6 ...
## $ aged_65_older : num 2.58 2.58 2.58 2.58 2.58 ...
## $ aged_70_older : num 1.34 1.34 1.34 1.34 1.34 ...
## $ gdp_per_capita : num 1804 1804 1804 1804 1804 ...
## $ extreme_poverty : num NA NA NA NA NA NA NA NA NA NA ...
## $ cardiovasc_death_rate : num 597 597 597 597 597 ...
## $ diabetes_prevalence : num 9.59 9.59 9.59 9.59 9.59 9.59 9.59 9.59 9.59 9.59 ...
## $ female_smokers : num NA NA NA NA NA NA NA NA NA NA ...
## $ male_smokers : num NA NA NA NA NA NA NA NA NA NA ...
## $ handwashing_facilities : num 37.7 37.7 37.7 37.7 37.7 ...
## $ hospital_beds_per_thousand : num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
## $ life_expectancy : num 64.8 64.8 64.8 64.8 64.8 ...
La descripción de cada una de las variables de este conjunto de datos se encuentra en el sitio: https://github.com/owid/covid-19-data/blob/master/public/data/owid-covid-data-codebook.md.
Ajuste de la columna date que es tipo factor a tipo date:
covid.df$date <- as.Date(covid.df$date, format("%Y-%m-%d"))
levels(covid.df$location)
## [1] "Afghanistan" "Albania"
## [3] "Algeria" "Andorra"
## [5] "Angola" "Anguilla"
## [7] "Antigua and Barbuda" "Argentina"
## [9] "Armenia" "Aruba"
## [11] "Australia" "Austria"
## [13] "Azerbaijan" "Bahamas"
## [15] "Bahrain" "Bangladesh"
## [17] "Barbados" "Belarus"
## [19] "Belgium" "Belize"
## [21] "Benin" "Bermuda"
## [23] "Bhutan" "Bolivia"
## [25] "Bonaire Sint Eustatius and Saba" "Bosnia and Herzegovina"
## [27] "Botswana" "Brazil"
## [29] "British Virgin Islands" "Brunei"
## [31] "Bulgaria" "Burkina Faso"
## [33] "Burundi" "Cambodia"
## [35] "Cameroon" "Canada"
## [37] "Cape Verde" "Cayman Islands"
## [39] "Central African Republic" "Chad"
## [41] "Chile" "China"
## [43] "Colombia" "Comoros"
## [45] "Congo" "Costa Rica"
## [47] "Cote d'Ivoire" "Croatia"
## [49] "Cuba" "Curacao"
## [51] "Cyprus" "Czech Republic"
## [53] "Democratic Republic of Congo" "Denmark"
## [55] "Djibouti" "Dominica"
## [57] "Dominican Republic" "Ecuador"
## [59] "Egypt" "El Salvador"
## [61] "Equatorial Guinea" "Eritrea"
## [63] "Estonia" "Ethiopia"
## [65] "Faeroe Islands" "Falkland Islands"
## [67] "Fiji" "Finland"
## [69] "France" "French Polynesia"
## [71] "Gabon" "Gambia"
## [73] "Georgia" "Germany"
## [75] "Ghana" "Gibraltar"
## [77] "Greece" "Greenland"
## [79] "Grenada" "Guam"
## [81] "Guatemala" "Guernsey"
## [83] "Guinea" "Guinea-Bissau"
## [85] "Guyana" "Haiti"
## [87] "Honduras" "Hong Kong"
## [89] "Hungary" "Iceland"
## [91] "India" "Indonesia"
## [93] "International" "Iran"
## [95] "Iraq" "Ireland"
## [97] "Isle of Man" "Israel"
## [99] "Italy" "Jamaica"
## [101] "Japan" "Jersey"
## [103] "Jordan" "Kazakhstan"
## [105] "Kenya" "Kosovo"
## [107] "Kuwait" "Kyrgyzstan"
## [109] "Laos" "Latvia"
## [111] "Lebanon" "Lesotho"
## [113] "Liberia" "Libya"
## [115] "Liechtenstein" "Lithuania"
## [117] "Luxembourg" "Macedonia"
## [119] "Madagascar" "Malawi"
## [121] "Malaysia" "Maldives"
## [123] "Mali" "Malta"
## [125] "Mauritania" "Mauritius"
## [127] "Mexico" "Moldova"
## [129] "Monaco" "Mongolia"
## [131] "Montenegro" "Montserrat"
## [133] "Morocco" "Mozambique"
## [135] "Myanmar" "Namibia"
## [137] "Nepal" "Netherlands"
## [139] "New Caledonia" "New Zealand"
## [141] "Nicaragua" "Niger"
## [143] "Nigeria" "Northern Mariana Islands"
## [145] "Norway" "Oman"
## [147] "Pakistan" "Palestine"
## [149] "Panama" "Papua New Guinea"
## [151] "Paraguay" "Peru"
## [153] "Philippines" "Poland"
## [155] "Portugal" "Puerto Rico"
## [157] "Qatar" "Romania"
## [159] "Russia" "Rwanda"
## [161] "Saint Kitts and Nevis" "Saint Lucia"
## [163] "Saint Vincent and the Grenadines" "San Marino"
## [165] "Sao Tome and Principe" "Saudi Arabia"
## [167] "Senegal" "Serbia"
## [169] "Seychelles" "Sierra Leone"
## [171] "Singapore" "Sint Maarten (Dutch part)"
## [173] "Slovakia" "Slovenia"
## [175] "Somalia" "South Africa"
## [177] "South Korea" "South Sudan"
## [179] "Spain" "Sri Lanka"
## [181] "Sudan" "Suriname"
## [183] "Swaziland" "Sweden"
## [185] "Switzerland" "Syria"
## [187] "Taiwan" "Tajikistan"
## [189] "Tanzania" "Thailand"
## [191] "Timor" "Togo"
## [193] "Trinidad and Tobago" "Tunisia"
## [195] "Turkey" "Turks and Caicos Islands"
## [197] "Uganda" "Ukraine"
## [199] "United Arab Emirates" "United Kingdom"
## [201] "United States" "United States Virgin Islands"
## [203] "Uruguay" "Uzbekistan"
## [205] "Vatican" "Venezuela"
## [207] "Vietnam" "Western Sahara"
## [209] "World" "Yemen"
## [211] "Zambia" "Zimbabwe"
levels(covid.df$continent)
## [1] "" "Africa" "Asia" "Europe"
## [5] "North America" "Oceania" "South America"
maxdays <- max(table(covid.df$location))
print(paste("Días máximos reportados:",maxdays))
## [1] "Días máximos reportados: 222"
names(table(covid.df$location)[table(covid.df$location)==maxdays])
## [1] "Australia" "Austria" "Belarus"
## [4] "Belgium" "Brazil" "Canada"
## [7] "China" "Croatia" "Czech Republic"
## [10] "Denmark" "Estonia" "Finland"
## [13] "France" "Germany" "Greece"
## [16] "Iceland" "Iran" "Israel"
## [19] "Italy" "Japan" "Lithuania"
## [22] "Luxembourg" "Malaysia" "Mexico"
## [25] "Nepal" "Netherlands" "Norway"
## [28] "Russia" "Singapore" "South Korea"
## [31] "Sweden" "Switzerland" "Taiwan"
## [34] "Thailand" "United Arab Emirates" "United Kingdom"
## [37] "United States" "Vietnam" "World"
startdate <- min(covid.df$date)
enddate <- max(covid.df$date)
print(paste("Inicia el", format(startdate, "%d de %B de %Y"),
"y termina el", format(enddate, "%d de %B de %Y")))
## [1] "Inicia el 31 de diciembre de 2019 y termina el 08 de agosto de 2020"
Filtramos nuestro dataframe original, sólo nos quedamos con las observaciones en donde la columna date sea igual a la fecha máxima. De esta manera tendremos los totales o acumulados más actualizados de cada país. Adicionalmente, eliminamos los países con menos de 1 millón de habitantes.
Analizaremos los rankings de países con las siguientes variables:
# Get one row of each country with the most updated totals
covid.total.df <- covid.df[covid.df$date== enddate,]
# Filter countries with less than 1 millón population
covid.total.df <- covid.total.df[covid.total.df$population >= 1000000,]
print(names(covid.total.df)[c(5,7,9,11)])
## [1] "total_cases" "total_deaths"
## [3] "total_cases_per_million" "total_deaths_per_million"
Para obtener el top 20 ordenamos los datos con la columna total_cases. Guardamos sólo los 20 países con más casos y los 20 países con menos casos. Finalmente, filtramos sólo las columnas que nos interesan. Y mostramos los datos en una tabla y en una gráfica de Barras horizontales.
# Ordenamos los datos de acuerdo a la columna total_cases de manera descendente
ranking.total_cases <- covid.total.df[order(-covid.total.df$total_cases),]
# Quitamos los datos de World, para solo quedarnos con datos de países
ranking.total_cases <- ranking.total_cases[ranking.total_cases$location != "World", ]
# Agregamos una nueva columna con el índice numérico de la posición del pais en el top
ranking.total_cases$position <- 1:nrow(ranking.total_cases)
# Nos quedamos sólo con los primeros 20 países y las columnas que nos interesan
columnfilter <- c("position", "location", "total_cases")
bottom20.total_cases <- tail(ranking.total_cases[, columnfilter],20)
top20.total_cases <- head(ranking.total_cases[, columnfilter],20)
rm(ranking.total_cases)
rownames(top20.total_cases) <- c()
rownames(bottom20.total_cases) <- c()
mexrow <- which(top20.total_cases$location=='Mexico')
tablecolnames <- c("Posición", "País", "Casos Totales")
top20.total_cases$total_cases_formated <- formatC(top20.total_cases$total_cases,
format="f", big.mark=",", digits=0)
bottom20.total_cases$total_cases_formated <- formatC(bottom20.total_cases$total_cases,
format="f", big.mark=",", digits=0)
kable(top20.total_cases[,c(1,2,4)], format = "html", align = "clr",
col.names=tablecolnames, caption="Top 20 - Países con más casos") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = F, position = "float_left") %>%
row_spec(mexrow, bold = T, color = "black", background = "yellow")
kable(bottom20.total_cases[,c(1,2,4)], format = "html", align = "clr",
col.names=tablecolnames, caption="Top 20 - Países con menos casos") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = F, position = "right")
| Posición | País | Casos Totales |
|---|---|---|
| 1 | United States | 4,941,796 |
| 2 | Brazil | 2,962,442 |
| 3 | India | 2,088,611 |
| 4 | Russia | 877,135 |
| 5 | South Africa | 545,476 |
| 6 | Mexico | 469,407 |
| 7 | Peru | 463,875 |
| 8 | Chile | 368,825 |
| 9 | Colombia | 367,196 |
| 10 | Iran | 322,567 |
| 11 | United Kingdom | 309,005 |
| 12 | Saudi Arabia | 285,793 |
| 13 | Pakistan | 283,487 |
| 14 | Bangladesh | 252,502 |
| 15 | Italy | 249,756 |
| 16 | Turkey | 238,450 |
| 17 | Argentina | 228,182 |
| 18 | Germany | 215,336 |
| 19 | France | 197,921 |
| 20 | Iraq | 144,064 |
| Posición | País | Casos Totales |
|---|---|---|
| 136 | Gambia | 1,090 |
| 137 | Syria | 1,060 |
| 138 | Togo | 1,028 |
| 139 | Jamaica | 987 |
| 140 | Chad | 942 |
| 141 | Botswana | 909 |
| 142 | Vietnam | 789 |
| 143 | Lesotho | 742 |
| 144 | Tanzania | 509 |
| 145 | Taiwan | 479 |
| 146 | Burundi | 405 |
| 147 | Myanmar | 359 |
| 148 | Mauritius | 344 |
| 149 | Mongolia | 293 |
| 150 | Eritrea | 285 |
| 151 | Cambodia | 246 |
| 152 | Trinidad and Tobago | 225 |
| 153 | Papua New Guinea | 188 |
| 154 | Timor | 25 |
| 155 | Laos | 20 |
ggplot(data=top20.total_cases, aes(x=reorder(paste(position, location),total_cases),
y=total_cases, fill=location))+
geom_bar(stat = "identity", position=position_dodge(), colour="black", show.legend = FALSE)+
ylab("Número de Casos Reportados de COVID-19") +
geom_text(aes(y=max(total_cases)+150000, label=total_cases_formated), color="black")+
labs(title="Top 20 países con más casos reportados de COVID-19")+
scale_y_continuous(breaks=c(100000, 250000, 500000, 750000, 1000000, 2500000, 4000000, 5000000),
label=c("100k", "250k", "500k", "750k", "1m", "2.5m", "4m", "5m"))+
coord_flip() +
xlab("Países") +
theme_bw()+
theme(title = element_text(size=14, face="bold", colour = "black"),
axis.text.y = element_text(size=11, face="bold", colour = "black"),
axis.text.x = element_text(size=11, face="bold", colour = "black"))
Obtenemos los rankings pero ahora para el continente americano. Filtramos los datos por la columna continent de “Nort America” y “South America”.
ranking.total_cases.america <- covid.total.df[covid.total.df$continent %in%
c("North America", "South America") ,]
ranking.total_cases.america <- ranking.total_cases.america[order(-ranking.total_cases.america$total_cases),]
ranking.total_cases.america$position <- 1:nrow(ranking.total_cases.america)
bottom20.total_cases.america <- tail(ranking.total_cases.america[, c("position", "location", "total_cases")],5)
top20.total_cases.america <- head(ranking.total_cases.america[, c("position", "location", "total_cases")],20)
rownames(bottom20.total_cases.america) <- c()
rownames(top20.total_cases.america) <- c()
mexrow <- which(ranking.total_cases.america$location=='Mexico')
rm(ranking.total_cases.america)
top20.total_cases.america$total_cases_formated <- formatC(top20.total_cases.america$total_cases,
format="f", big.mark=",", digits=0)
bottom20.total_cases.america$total_cases_formated <- formatC(bottom20.total_cases.america$total_cases,
format="f", big.mark=",", digits=0)
tablecolnames <- c("Posición", "País", "Casos Totales")
kable(top20.total_cases.america[,c(1,2,4)], format = "html", align = "clr",
col.names=tablecolnames, caption="Top 20 América - Países con más casos") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = F, position = "center") %>%
row_spec(mexrow, bold = T, color = "black", background = "yellow")
| Posición | País | Casos Totales |
|---|---|---|
| 1 | United States | 4,941,796 |
| 2 | Brazil | 2,962,442 |
| 3 | Mexico | 469,407 |
| 4 | Peru | 463,875 |
| 5 | Chile | 368,825 |
| 6 | Colombia | 367,196 |
| 7 | Argentina | 228,182 |
| 8 | Canada | 118,970 |
| 9 | Ecuador | 91,969 |
| 10 | Bolivia | 87,891 |
| 11 | Dominican Republic | 77,709 |
| 12 | Panama | 72,560 |
| 13 | Guatemala | 55,270 |
| 14 | Honduras | 46,365 |
| 15 | Venezuela | 24,166 |
| 16 | Costa Rica | 22,081 |
| 17 | Puerto Rico | 20,686 |
| 18 | El Salvador | 19,544 |
| 19 | Haiti | 7,599 |
| 20 | Paraguay | 6,508 |
ggplot(data=top20.total_cases.america, aes(x=reorder(paste(position, location),total_cases),
y=total_cases, fill=location))+
geom_bar(stat = "identity", position=position_dodge(), colour="black", show.legend = FALSE)+
ylab("Número de Casos Reportados de COVID-19") +
geom_text(aes(y=max(total_cases)+150000, label=format(total_cases, big.mark=",")), color="black")+
labs(title="Top 20 países con más casos reportados de COVID-19 de América")+
scale_y_continuous(breaks=c(100000, 250000, 500000, 750000, 1000000, 2500000, 4000000, 5000000),
label=c("100k", "250k", "500k", "750k", "1m", "2.5m", "4m", "5m"))+
coord_flip() +
xlab("Países") +
theme_bw()+
theme(title = element_text(size=14, face="bold", colour = "black"),
axis.text.y = element_text(size=11, face="bold", colour = "black"),
axis.text.x = element_text(size=11, face="bold", colour = "black"))
ranking.total_cases_per_million <- covid.total.df[order(-covid.total.df$total_cases_per_million),]
ranking.total_cases_per_million$position <- 1:nrow(ranking.total_cases_per_million)
columnfilter <- c("position", "location", "total_cases_per_million")
mexico.total_cases_per_million <- ranking.total_cases_per_million[
ranking.total_cases_per_million$location == "México", ]
bottom20.total_cases_per_million <- tail(ranking.total_cases_per_million[, columnfilter],20)
top20.total_cases_per_million <- head(ranking.total_cases_per_million[, columnfilter],20)
mexico.total_cases_per_million <- mexico.total_cases_per_million[, columnfilter]
rm(ranking.total_cases_per_million)
rownames(top20.total_cases_per_million) <- c()
rownames(bottom20.total_cases_per_million) <- c()
rownames(mexico.total_cases_per_million) <- c()
top20.total_cases_per_million <- rbind(top20.total_cases_per_million, mexico.total_cases_per_million)
mexrow <- which(top20.total_cases_per_million$location=='Mexico')
tablecolnames <- c("Posición", "País", "Casos")
top20.total_cases_per_million$total_cases_per_million_formated <- formatC(
top20.total_cases_per_million$total_cases_per_million, format="f", big.mark=",", digits=2)
bottom20.total_cases_per_million$total_cases_per_million_formated<- formatC(
bottom20.total_cases_per_million$total_cases_per_million, format="f", big.mark=",", digits=2)
kable(top20.total_cases_per_million[,c(1,2,4)], format = "html", align = "clr",
col.names=tablecolnames, caption="Top 20 - Países con más casos por millón de habitantes") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = F, position = "float_left") %>%
row_spec(mexrow, bold = T, color = "black", background = "yellow")
kable(bottom20.total_cases_per_million[,c(1,2,4)], format = "html", align = "clr",
col.names=tablecolnames, caption="Top 20 - Países con menos casos por millón de habitantes") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = F, position = "right")
| Posición | País | Casos |
|---|---|---|
| 1 | Qatar | 39,007.52 |
| 2 | Bahrain | 25,451.01 |
| 3 | Chile | 19,293.84 |
| 4 | Panama | 16,816.66 |
| 5 | Kuwait | 16,561.52 |
| 6 | Oman | 15,874.88 |
| 7 | United States | 14,929.78 |
| 8 | Peru | 14,068.82 |
| 9 | Brazil | 13,937.01 |
| 10 | Armenia | 13,459.96 |
| 11 | Singapore | 9,366.46 |
| 12 | Israel | 9,357.13 |
| 13 | South Africa | 9,197.24 |
| 14 | Saudi Arabia | 8,209.17 |
| 15 | Sweden | 8,151.38 |
| 16 | Bolivia | 7,529.41 |
| 17 | Belarus | 7,261.26 |
| 18 | Puerto Rico | 7,230.74 |
| 19 | Colombia | 7,216.49 |
| 20 | Dominican Republic | 7,163.50 |
| Posición | País | Casos |
|---|---|---|
| 137 | Eritrea | 80.36 |
| 138 | Mozambique | 70.80 |
| 139 | China | 61.54 |
| 140 | Syria | 60.57 |
| 141 | Yemen | 60.22 |
| 142 | Chad | 57.35 |
| 143 | Burkina Faso | 56.21 |
| 144 | Thailand | 47.97 |
| 145 | Niger | 47.63 |
| 146 | Angola | 45.12 |
| 147 | Burundi | 34.06 |
| 148 | Uganda | 27.41 |
| 149 | Papua New Guinea | 21.01 |
| 150 | Taiwan | 20.11 |
| 151 | Timor | 18.96 |
| 152 | Cambodia | 14.71 |
| 153 | Tanzania | 8.52 |
| 154 | Vietnam | 8.11 |
| 155 | Myanmar | 6.60 |
| 156 | Laos | 2.75 |
ggplot(data=top20.total_cases_per_million, aes(x=reorder(paste(position, location),total_cases_per_million),
y=total_cases_per_million, fill=location))+
geom_bar(stat = "identity", position=position_dodge(), colour="black", show.legend = FALSE)+
ylab("Número de Casos Reportados de COVID-19") +
geom_text(aes(y=max(total_cases_per_million)+1250, label=total_cases_per_million_formated), color="black")+
labs(title="Top 20 países con más casos reportados de COVID-19 por millón de habitantes (+ México)")+
coord_flip() +
xlab("Países") +
theme_bw()+
theme(title = element_text(size=14, face="bold", colour = "black"),
axis.text.y = element_text(size=11, face="bold", colour = "black"),
axis.text.x = element_text(size=11, face="bold", colour = "black"))
Para obtener el top 20 ordenamos los datos con la columna total_deaths. Guardamos sólo los 20 países con más casos y los 20 países con menos casos. Filtramos sólo las columnas que nos interesan. Finalmente, mostramos los datos en una tabla y en una gráfica de Barras horizontales.
ranking.total_deaths <- covid.total.df[order(-covid.total.df$total_deaths),]
ranking.total_deaths <- ranking.total_deaths[ranking.total_deaths$location != "World", ]
ranking.total_deaths$position <- 1:nrow(ranking.total_deaths)
columnfilter <- c("position", "location", "total_deaths")
bottom20.total_deaths <- tail(ranking.total_deaths[, columnfilter],20)
top20.total_deaths <- head(ranking.total_deaths[, columnfilter],20)
mexrow <- which(ranking.total_deaths$location=='Mexico')
rm(ranking.total_deaths)
rownames(top20.total_deaths) <- c()
rownames(bottom20.total_deaths) <- c()
tablecolnames <- c("Posición", "País", "Muertes")
top20.total_deaths$total_deaths_formated <- formatC(top20.total_deaths$total_deaths,
format="f", big.mark=",", digits=0)
bottom20.total_deaths$total_deaths_formated <- formatC(bottom20.total_deaths$total_deaths,
format="f", big.mark=",", digits=0)
kable(top20.total_deaths[,c(1,2,4)], format = "html", align = "clr",
col.names=tablecolnames, caption="Top 20 - Países con más muertes atribuibles a COVID-19") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = F, position = "float_left") %>%
row_spec(mexrow, bold = T, color = "black", background = "yellow")
kable(bottom20.total_deaths[,c(1,2,4)], format = "html", align = "clr",
col.names=tablecolnames, caption="Top 20 - Países con menos muertes atribuibles a COVID-19") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = F, position = "right")
| Posición | País | Muertes |
|---|---|---|
| 1 | United States | 161,356 |
| 2 | Brazil | 99,572 |
| 3 | Mexico | 51,311 |
| 4 | United Kingdom | 46,511 |
| 5 | India | 42,518 |
| 6 | Italy | 35,190 |
| 7 | France | 30,324 |
| 8 | Peru | 20,649 |
| 9 | Iran | 18,132 |
| 10 | Russia | 14,725 |
| 11 | Colombia | 12,250 |
| 12 | Chile | 9,958 |
| 13 | South Africa | 9,909 |
| 14 | Belgium | 9,866 |
| 15 | Germany | 9,195 |
| 16 | Canada | 8,970 |
| 17 | Netherlands | 6,154 |
| 18 | Pakistan | 6,068 |
| 19 | Ecuador | 5,897 |
| 20 | Turkey | 5,813 |
| Posición | País | Muertes |
|---|---|---|
| 136 | Namibia | 16 |
| 137 | Mozambique | 15 |
| 138 | Jamaica | 13 |
| 139 | Jordan | 11 |
| 140 | Sri Lanka | 11 |
| 141 | Mauritius | 10 |
| 142 | Vietnam | 10 |
| 143 | Trinidad and Tobago | 8 |
| 144 | Taiwan | 7 |
| 145 | Myanmar | 6 |
| 146 | Uganda | 6 |
| 147 | Rwanda | 5 |
| 148 | Papua New Guinea | 3 |
| 149 | Botswana | 2 |
| 150 | Burundi | 1 |
| 151 | Cambodia | 0 |
| 152 | Eritrea | 0 |
| 153 | Laos | 0 |
| 154 | Mongolia | 0 |
| 155 | Timor | 0 |
ggplot(data=top20.total_deaths, aes(x=reorder(paste(position, location),total_deaths),
y=total_deaths, fill=location))+
geom_bar(stat = "identity", position=position_dodge(), colour="black", show.legend = FALSE)+
ylab("Número de muertes atribuibles a COVID-19") +
geom_text(aes(y=max(total_deaths)+7500, label=total_deaths_formated), color="black")+
labs(title="Top 20 países con más muertes atribuibles a COVID-19")+
scale_y_continuous(breaks=c(10000, 25000, 50000, 75000, 100000, 150000, 170000),
label=c("10k", "25k", "50k", "75k", "100k", "150k", "170k"))+
coord_flip() +
xlab("Países") +
theme_bw()+
theme(title = element_text(size=14, face="bold", colour = "black"),
axis.text.y = element_text(size=11, face="bold", colour = "black"),
axis.text.x = element_text(size=11, face="bold", colour = "black"))
Para obtener el top 20 ordenamos los datos con la columna total_deaths_per_million. Guardamos sólo los 20 países con más casos y los 20 países con menos casos. Filtramos sólo las columnas que nos interesan. Finalmente, mostramos los datos en una tabla y en una gráfica de barras horizontales.
ranking.total_deaths_per_million <- covid.total.df[order(-covid.total.df$total_deaths_per_million),]
ranking.total_deaths_per_million$position <- 1:nrow(ranking.total_deaths_per_million)
columnfilter <- c("position", "location", "total_deaths_per_million")
bottom20.total_deaths_per_million <- tail(ranking.total_deaths_per_million[, columnfilter],20)
top20.total_deaths_per_million <- head(ranking.total_deaths_per_million[, columnfilter],20)
mexrow <- which(ranking.total_deaths_per_million$location=='Mexico')
rm(ranking.total_deaths_per_million)
rownames(top20.total_deaths_per_million) <- c()
rownames(bottom20.total_deaths_per_million) <- c()
tablecolnames <- c("Posición", "País", "Muertes")
top20.total_deaths_per_million$total_deaths_per_million_formated <- formatC(
top20.total_deaths_per_million$total_deaths_per_million, format="f", big.mark=",", digits=2)
bottom20.total_deaths_per_million$total_deaths_per_million_formated <- formatC(
bottom20.total_deaths_per_million$total_deaths_per_million, format="f", big.mark=",", digits=2)
kable(top20.total_deaths_per_million[,c(1,2,4)], format = "html", align = "clr",
col.names=tablecolnames,
caption="Top 20 - Países con más muertes atribuibles a COVID-19 por millón de habitantes") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = F, position = "float_left") %>%
row_spec(mexrow, bold = T, color = "black", background = "yellow")
kable(bottom20.total_deaths_per_million[,c(1,2,4)], format = "html", align = "clr",
col.names=tablecolnames,
caption="Top 20 - Países con menos muertes atribuibles a COVID-19 por millón de habitantes") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = F, position = "right")
| Posición | País | Muertes |
|---|---|---|
| 1 | Belgium | 851.28 |
| 2 | United Kingdom | 685.13 |
| 3 | Peru | 626.26 |
| 4 | Italy | 582.02 |
| 5 | Sweden | 570.63 |
| 6 | Chile | 520.92 |
| 7 | United States | 487.48 |
| 8 | Brazil | 468.44 |
| 9 | France | 464.57 |
| 10 | Mexico | 397.97 |
| 11 | Panama | 368.73 |
| 12 | Netherlands | 359.15 |
| 13 | Ireland | 358.87 |
| 14 | Ecuador | 334.24 |
| 15 | Bolivia | 301.89 |
| 16 | Armenia | 262.21 |
| 17 | Macedonia | 249.11 |
| 18 | Colombia | 240.75 |
| 19 | Canada | 237.66 |
| 20 | Kyrgyzstan | 223.78 |
| Posición | País | Muertes |
|---|---|---|
| 137 | Nepal | 2.40 |
| 138 | Angola | 1.95 |
| 139 | Jordan | 1.08 |
| 140 | Botswana | 0.85 |
| 141 | Thailand | 0.83 |
| 142 | Sri Lanka | 0.51 |
| 143 | Mozambique | 0.48 |
| 144 | Rwanda | 0.39 |
| 145 | Tanzania | 0.35 |
| 146 | Papua New Guinea | 0.34 |
| 147 | Taiwan | 0.29 |
| 148 | Uganda | 0.13 |
| 149 | Myanmar | 0.11 |
| 150 | Vietnam | 0.10 |
| 151 | Burundi | 0.08 |
| 152 | Cambodia | 0.00 |
| 153 | Eritrea | 0.00 |
| 154 | Laos | 0.00 |
| 155 | Mongolia | 0.00 |
| 156 | Timor | 0.00 |
ggplot(data=top20.total_deaths_per_million, aes(x=reorder(paste(position, location),total_deaths_per_million),
y=total_deaths_per_million, fill=location))+
geom_bar(stat = "identity", position=position_dodge(), colour="black", show.legend = FALSE)+
ylab("Número de muertes atribuibles a COVID-19 por millón de habitantes") +
geom_text(aes(y=max(total_deaths_per_million)+25, label=total_deaths_per_million_formated), color="black")+
labs(title="Top 20 países con más muertes atribuibles a COVID-19 por millón de habitantes")+
coord_flip() +
xlab("Países") +
theme_bw()+
theme(title = element_text(size=14, face="bold", colour = "black"),
axis.text.y = element_text(size=11, face="bold", colour = "black"),
axis.text.x = element_text(size=11, face="bold", colour = "black"))
Gráfica creada desde la fecha inicial del conjunto de datos:
plot.trend.new_cases <- function(startdate, enddate, countries, graphtitle){
tmp.df <- covid.df[covid.df$location %in% countries,]
tmp.df <- tmp.df[tmp.df$date >= startdate,]
tmp.df <- tmp.df[tmp.df$date <= enddate,]
ggplot(data=tmp.df , aes(x=date, y=new_cases, group=location, colour=location)) +
geom_line(size=1) +
scale_color_discrete(name = "Países")+
scale_x_date(date_breaks = "month", date_labels = "%B")+
ggtitle(graphtitle)+
ylab("Nuevos casos de COVID-19")+
xlab("Fecha")+
theme_bw()+
theme(title = element_text(size=14, face="bold", colour = "black"),
axis.text.y = element_text(size=11, face="bold", colour = "black"),
axis.text.x = element_text(size=11, face="bold", colour = "black"),
legend.position = "bottom",
legend.title = element_text(size = 14),
legend.text = element_text(size = 13),
legend.key.width = unit(1.5,"cm"))+
guides(colour = guide_legend(override.aes = list(size=2)))
}
top5.total_cases <- c(head(as.character(top20.total_cases$location),5), "México")
plot.trend.new_cases(startdate, enddate, top5.total_cases, "Tendencia de contagios de Covid-19 por países de enero a agosto")
Podemos observar que las líneas empiezan a subir a partir de marzo. Modificamos la gráfica con diversas fechas de inicio. Por ejemplo, con los datos a partir de marzo:
plot.trend.new_cases("2020-03-01", enddate, top5.total_cases, "Tendencia de contagios de Covid-19 por países de marzo a agosto")
plot.trend.new_cases("2020-06-01", enddate, top5.total_cases, "Tendencia de contagios de Covid-19 por países de junio a agosto")
plot.trend.new_cases("2020-07-01", enddate, top5.total_cases, "Tendencia de contagios de Covid-19 por países de julio a agosto")
plot.trend.new_cases <- function(startdate, enddate, countries, graphtitle){
tmp.df <- covid.df[covid.df$location %in% countries,]
tmp.df <- tmp.df[tmp.df$date >= startdate,]
tmp.df <- tmp.df[tmp.df$date <= enddate,]
ggplot(data=tmp.df , aes(x=date, y=new_cases, group=location, colour=location)) +
geom_line(size=1) +
scale_color_discrete(name = "Países")+
scale_x_date(date_breaks = "day", date_labels = "%b %d")+
ggtitle(graphtitle)+
ylab("Nuevos casos de COVID-19")+
xlab("Fecha")+
theme_bw()+
theme(title = element_text(size=14, face="bold", colour = "black"),
axis.text.y = element_text(size=11, face="bold", colour = "black"),
axis.text.x = element_text(size=11, face="bold", colour = "black"),
legend.position = "bottom",
legend.title = element_text(size = 14),
legend.text = element_text(size = 13),
legend.key.width = unit(1.5,"cm"))+
guides(colour = guide_legend(override.aes = list(size=2)))
}
plot.trend.new_cases(enddate-14, enddate, top5.total_cases, "Tendencia de contagios de Covid-19 por países de las últimas dos semanas")
Gráfica creada desde la fecha inicial del conjunto de datos:
plot.trend.new_deaths <- function(startdate, enddate, countries, graphtitle){
tmp.df <- covid.df[covid.df$location %in% countries,]
tmp.df <- tmp.df[tmp.df$date >= startdate,]
tmp.df <- tmp.df[tmp.df$date <= enddate,]
ggplot(data=tmp.df , aes(x=date, y=new_deaths, group=location, colour=location)) +
geom_line(size=1) +
scale_color_discrete(name = "Países")+
scale_x_date(date_breaks = "month", date_labels = "%B")+
ggtitle(graphtitle)+
ylab("Nuevos muertes atribuibles a COVID-19")+
xlab("Fecha")+
theme_bw()+
theme(title = element_text(size=14, face="bold", colour = "black"),
axis.text.y = element_text(size=11, face="bold", colour = "black"),
axis.text.x = element_text(size=11, face="bold", colour = "black"),
legend.position = "bottom",
legend.title = element_text(size = 14),
legend.text = element_text(size = 13),
legend.key.width = unit(1.5,"cm"))+
guides(colour = guide_legend(override.aes = list(size=2)))
}
top5.total_cases <- c(head(as.character(top20.total_cases$location),5), "México")
plot.trend.new_deaths(startdate, enddate, top5.total_cases, "Tendencia de nuevas muertes atribuidas a Covid-19 por países de enero a agosto")
Gráfica con los datos a partir de marzo:
plot.trend.new_deaths("2020-03-01", enddate, top5.total_cases, "Tendencia de nuevas muertes atribuidas a Covid-19 de marzo a agosto")
plot.trend.new_deaths("2020-06-01", enddate, top5.total_cases, "Tendencia de nuevas muertes atribuidas a Covid-19 por países de junio a agosto")
plot.trend.new_deaths("2020-07-01", enddate, top5.total_cases, "Tendencia de nuevas muertes atribuidas a Covid-19 por países de julio a agosto")
plot.trend.new_deaths <- function(startdate, enddate, countries, graphtitle){
tmp.df <- covid.df[covid.df$location %in% countries,]
tmp.df <- tmp.df[tmp.df$date >= startdate,]
tmp.df <- tmp.df[tmp.df$date <= enddate,]
ggplot(data=tmp.df , aes(x=date, y=new_deaths, group=location, colour=location)) +
geom_line(size=1) +
scale_color_discrete(name = "Países")+
scale_x_date(date_breaks = "day", date_labels = "%b %d")+
ggtitle(graphtitle)+
ylab("Nuevos casos de COVID-19")+
xlab("Fecha")+
theme_bw()+
theme(title = element_text(size=14, face="bold", colour = "black"),
axis.text.y = element_text(size=11, face="bold", colour = "black"),
axis.text.x = element_text(size=11, face="bold", colour = "black"),
legend.position = "bottom",
legend.title = element_text(size = 14),
legend.text = element_text(size = 13),
legend.key.width = unit(1.5,"cm"))+
guides(colour = guide_legend(override.aes = list(size=2)))
}
plot.trend.new_deaths(enddate-14, enddate, top5.total_cases, "Tendencia de nuevas muertes atribuidas a Covid-19 por países de las últimas dos semanas")
plot.trend.total_cases <- function(startdate, enddate, countries, graphtitle){
tmp.df <- covid.df[covid.df$location %in% countries,]
tmp.df <- tmp.df[tmp.df$date >= startdate,]
tmp.df <- tmp.df[tmp.df$date <= enddate,]
ggplot(data=tmp.df , aes(x=date, y=total_cases, group=location, colour=location)) +
geom_line(size=1) +
scale_color_discrete(name = "Países")+
scale_x_date(date_breaks = "month", date_labels = "%B")+
ggtitle(graphtitle)+
ylab("Casos acumulados de COVID-19")+
xlab("Fecha")+
theme_bw()+
theme(title = element_text(size=14, face="bold", colour = "black"),
axis.text.y = element_text(size=11, face="bold", colour = "black"),
axis.text.x = element_text(size=11, face="bold", colour = "black"),
legend.position = "bottom",
legend.title = element_text(size = 14),
legend.text = element_text(size = 13),
legend.key.width = unit(1.5,"cm"))+
guides(colour = guide_legend(override.aes = list(size=2)))
}
plot.trend.total_cases(startdate, enddate, top5.total_cases, "Curvas de contagios acumulados de COVID-19 de los 5 países con más contagios reportados ")
Gráfica con los datos a partir de marzo:
plot.trend.total_cases("2020-03-01", enddate, top5.total_cases, "Curvas de contagios acumulados de COVID-19 de los 5 países con más contagios reportados de marzo a agosto")
plot.trend.total_cases("2020-06-01", enddate, top5.total_cases, "Curvas de contagios acumulados de COVID-19 de los 5 países con más contagios reportados de junio a agosto")
plot.trend.total_cases("2020-07-01", enddate, top5.total_cases, "Curvas de contagios acumulados de COVID-19 de los 5 países con más contagios reportados de julio a agosto")
La cantidad de contagios que tienen Estados Unidos y Brasil no me permiten apreciar las curvas del resto de los países. Quitamos a esos 2 países y agregamos a los 2 países que sigan en el top 20 de países con más contagios reportados.
total_cases.others <- as.character(top20.total_cases$location)[3:7]
plot.trend.total_cases("2020-07-01", enddate, total_cases.others, "Curvas de contagios acumulados de COVID-19 reportados de julio a agosto")
plot.trend.total_cases <- function(startdate, enddate, countries, graphtitle){
tmp.df <- covid.df[covid.df$location %in% countries,]
tmp.df <- tmp.df[tmp.df$date >= startdate,]
tmp.df <- tmp.df[tmp.df$date <= enddate,]
ggplot(data=tmp.df , aes(x=date, y=total_cases, group=location, colour=location)) +
geom_line(size=1) +
scale_color_discrete(name = "Países")+
scale_x_date(date_breaks = "day", date_labels = "%b %d")+
ggtitle(graphtitle)+
ylab("Casos acumulados de COVID-19")+
xlab("Fecha")+
theme_bw()+
theme(title = element_text(size=14, face="bold", colour = "black"),
axis.text.y = element_text(size=11, face="bold", colour = "black"),
axis.text.x = element_text(size=11, face="bold", colour = "black"),
legend.position = "bottom",
legend.title = element_text(size = 14),
legend.text = element_text(size = 13),
legend.key.width = unit(1.5,"cm"))+
guides(colour = guide_legend(override.aes = list(size=2)))
}
plot.trend.total_cases(enddate-14, enddate, total_cases.others, "Curvas de contagios acumulados de COVID-19 reportados las últimas dos semanas")
Quitamos a India y Rusia, para ver más a detalle las curvas de contagio acumulado de México, perú y Sudáfrica de las últimas dos semanas:
total_cases.others <- as.character(top20.total_cases$location)[3:7]
total_cases.others <- total_cases.others[c(-1,-2)]
countrieslabels <- paste(total_cases.others, collapse = ', ')
plot.trend.total_cases(enddate-14, enddate, total_cases.others, paste("Curvas de contagios acumulados de COVID-19 reportados las últimas dos semanas",countrieslabels))
plot.trend.total_deaths <- function(startdate, enddate, countries, graphtitle){
tmp.df <- covid.df[covid.df$location %in% countries,]
tmp.df <- tmp.df[tmp.df$date >= startdate,]
tmp.df <- tmp.df[tmp.df$date <= enddate,]
ggplot(data=tmp.df , aes(x=date, y=total_deaths, group=location, colour=location)) +
geom_line(size=1) +
scale_color_discrete(name = "Países")+
scale_x_date(date_breaks = "month", date_labels = "%B")+
ggtitle(graphtitle)+
ylab("Muertes acumulados por COVID-19")+
xlab("Fecha")+
theme_bw()+
theme(title = element_text(size=14, face="bold", colour = "black"),
axis.text.y = element_text(size=11, face="bold", colour = "black"),
axis.text.x = element_text(size=11, face="bold", colour = "black"),
legend.position = "bottom",
legend.title = element_text(size = 14),
legend.text = element_text(size = 13),
legend.key.width = unit(1.5,"cm"))+
guides(colour = guide_legend(override.aes = list(size=2)))
}
top5.total_deaths <- head(as.character(top20.total_deaths$location),5)
plot.trend.total_deaths(startdate, enddate, top5.total_deaths, "Curvas de muertes acumuladas de COVID-19 de los 5 países con más muertes reportadas en total ")
Gráfica con los datos a partir de marzo:
plot.trend.total_deaths("2020-03-01", enddate, top5.total_deaths, "Curvas de muertes acumuladas por COVID-19 de marzo a agosto de los 5 países con más muertes reportadas en total")
plot.trend.total_deaths("2020-06-01", enddate, top5.total_deaths, "Curvas de muertes acumuladas por COVID-19 de junio a agosto de los 5 países con más muertes reportadas en total")
plot.trend.total_deaths("2020-07-01", enddate, top5.total_deaths, "Curvas de muertes acumuladas por COVID-19 de julio a agosto de los 5 países con más muertes reportadas en total")
La cantidad de contagios que tienen Estados Unidos y Brasil no permiten apreciar las curvas del resto de los países. Quitamos a esos 2 países y agregamos a los 2 países que sigan en el top 20 de países con más contagios reportados.
total_deaths.others <- as.character(top20.total_deaths$location)[3:7]
plot.trend.total_deaths("2020-07-01", enddate, total_deaths.others, "Curvas de contagios acumulados de COVID-19 reportados de julio a agosto")
plot.trend.total_deaths <- function(startdate, enddate, countries, graphtitle){
tmp.df <- covid.df[covid.df$location %in% countries,]
tmp.df <- tmp.df[tmp.df$date >= startdate,]
tmp.df <- tmp.df[tmp.df$date <= enddate,]
ggplot(data=tmp.df , aes(x=date, y=total_deaths, group=location, colour=location)) +
geom_line(size=1) +
scale_color_discrete(name = "Países")+
scale_x_date(date_breaks = "day", date_labels = "%b %d")+
ggtitle(graphtitle)+
ylab("Muertes acumulados por COVID-19")+
xlab("Fecha")+
theme_bw()+
theme(title = element_text(size=14, face="bold", colour = "black"),
axis.text.y = element_text(size=11, face="bold", colour = "black"),
axis.text.x = element_text(size=11, face="bold", colour = "black"),
legend.position = "bottom",
legend.title = element_text(size = 14),
legend.text = element_text(size = 13),
legend.key.width = unit(1.5,"cm"))+
guides(colour = guide_legend(override.aes = list(size=2)))
}
plot.trend.total_deaths(enddate-14, enddate, total_deaths.others, "Curvas de contagios acumulados de COVID-19 reportados de las últimas dos semanas")
Las curvas de Francia, Italia y Reino Unido están planas, se ha estancado la cantidad de muertes atribuidas a COVID-19. Una posible razón es que la pandemia inició primero en Europa. Las curvas de México e India siguen en aumento. Quitamos esos países y agregamos otros en donde al parecer la cantidad de muertes atribuidas a COVID-19 aún sigue en aumento, y generamos la gráfica de las curvas de las últimas 2 semanas sin contar a Estados Unidos y Brasil:
total_deaths.others <- as.character(top20.total_deaths$location)[c(3,5,8,9,10)]
countrieslabels <- paste(total_deaths.others, collapse = ', ')
plot.trend.total_deaths(enddate-14, enddate, total_deaths.others, paste("Curvas de muertes acumuladas por COVID-19 reportadas las últimas dos semanas:",countrieslabels))